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O 

,_^ We show how to construct the best hnear unbiased predictor (BLUP) 

f~^ for the continuation of a curve in a spline-function model. We assume 

fNj that the entire curve is drawn from some smooth random process and 

►^^ that the curve is given up to some cut point. We demonstrate how 

r^ to compute the BLUP efficiently. Confidence bands for the BLUP 

*^ are discussed. Finally, we apply the proposed BLUP to real-world 

^H call center data. Specifically, we forecast the continuation of both the 

,_^ call arrival counts and the workload process at the call center of a 

,-H commercial bank. 

'~j 1. Introduction. Many data sets consist of a finite number of multi- 

"^ dimensional observations, where each of these observations is sampled from 

"j^ some underlying smoothed curve. In such cases it can be advantageous to 

"t^ address the observations as functional data rather than as multiple series of 

'~~' data points. This approach was found useful, for example, in noise reduction, 

^_H missing data handling, and in producing robust estimations (see the books 

>> Ramsay and Silverman, 2002, 2005, for a comprehensive treatment of func- 

tional data analysis). In this work we consider the problem of forecasting 
00 the continuation of a curve using functional data techniques. 

The problem we consider here is relevant to longitudinal data sets, in 
lO which each observation consists of a series of measurements over time that 

describe an underlying curve. Examples of such curves are growth curves of 
different individuals and arrival rates of calls to a call center or of patients 
to an emergency room during different days. We assume that such curves, 
or measurement series that approximate these curves, were collected previ- 
ously. We would like to estimate the continuation of a new curve given its 
C^ beginning, using the behavior of the previously collected curves. 

Although each observation consists of a finite number of points, the ob- 
servation can be thought of as a smooth function. This dual representation 
leads to two different approaches when attempting to solve the prediction 
problem. In the discrete approach, each observation is a longitudinal vector 
of length p + q. We are interested in the prediction of the last g-length part 
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2 GOLDBERG ET AL. 

of the new observation, given its beginning p- length part. This can be com- 
puted by treating the beginning p-length vector as the predictor variables 
and the last g-length vector as the response variables. A prediction can be 
found, for example, by finding the best linear unbiased predictor (see (5)). 
The disadvantage of the discrete approach is that the smooth nature of the 
underlying function is ignored. If, instead, the continuous approach is used, 
the prediction problem might be treated naively using regression techniques 
in which both the predictor and the response are functions (Ramsay and 
Silverman, 2005, Chapter 16). However, these techniques do not take into 
account the fact that the response function is a smooth continuation of the 
predictor function. 

In this paper, we choose the continuous approach. Specifically, we would 
like to generalize the discrete case to the continuous one, taking the smooth 
nature of the curves into account. There are three main points that need to 
be addressed. First, the curves lie within an infinite dimensional space, while 
the number of observed curves is finite. This indicates that a simple model 
for description of the data is required. Second, the full-length curves, the 
curve beginnings, and the curve continuations all lie in different functional 
spaces, which, in contrast to the discrete case, cannot generally be related 
by a linear projection. Third, we require that the prediction should be a 
smooth continuation of the beginning of the curve (at least in the absence 
of noise) . 

Forecasting of the continuation of a function was considered in previous 
works. Besse, Cardot and Stephenson (2000), and Antoniadis, Paparodi- 
tis and Sapatinas (2006), among others, developed different techniques for 
curve- valued autoregressive processes. In these models, each curve describes 
some longitudinal data cycle such as climate variation during a year (Besse, 
Cardot and Stephenson, 2000) and television audience rates during a day 
(Antoniadis, Paparoditis and Sapatinas, 2006). These models assume that 
the cycles behave according to some autoregressive model. The aim of these 
works is to predict the next cycles given past observed cycles. The conti- 
nuity point at the beginning of each cycle, if it exists, is usually not taken 
into account. The model discussed in Shen (2009) is closer to ours. Shen 
discusses a curved-valued time series model in which past curves were pre- 
viously observed, and the beginning part of a new curve is given. Shen first 
forecasts the new curve entirely, and then updates this forecast based on 
the given curve beginning using penalized least squares. However, all the 
models discussed above assume some time series behavior, while the model 
discussed here assumes that the curve- valued observations are independent. 

The forecasting of curve continuation suggested here is based on finding 
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THE BLUP FOR CONTINUATION OF A FUNCTION 3 

the best linear unbiased predictor (BLUP) (Robinson, 1991). We assume 
that the curves are governed by a small number of factors, possibly with 
additional noise. These factors determine the main variation between the 
different curves. The computation of the predictor is performed in two steps. 
First, the factors' coefficients are estimated from the beginning of the new 
curve, which is defined on the first part of the segment. Second, the predic- 
tion is obtained by computing the representation of the factors on the latter 
part of the segment. We prove that the resulting estimator is indeed the 
BLUP and that it is a smooth continuation of the beginning of the curve 
(at least in the absence of noise). 

The two-step procedure for obtaining the BLUP involves computation of 
the mean function on both partial segments, and of the covariance operator 
on both segments and between them, which can be computationally de- 
manding if not performed prudently. We approximate the curve data using 
a spline function space of (possibly large) finite-dimension (de Boor, 2001). 
More specifically, we represent the curves using appropriate B-splines bases. 
The use of splines is common in functional data analysis due to the simplicity 
of spline computation, and the ability of splines to approximate smooth func- 
tions. We take advantage of two more attributes of finite-dimensional spline 
functional spaces. First, the functional space restriction from the whole seg- 
ment to a partial segment (the beginning part or the latter part) has a natu- 
ral B-spline basis that has a lower number of elements. This solves collinear- 
ity problems which can render any projection on the partial segment basis 
instable. Second, the knot-insertion algorithm (see de Boor, 2001, Chapter 
11) ensures an efficient and stable way to compute the mean function and 
covariance operators on different partial segments. 

The proposed forecasting procedure yields a smooth curve which is the 
best linear unbiased prediction. Note, however, that the continuation part of 
the function is random, and therefore requires confidence bands. We present 
confidence bands for the prediction, following Knafl, Sacks and Ylvisaker 
(1985), under the assumption that the curves aries from a Gaussian pro- 
cess. The bands are computed in two steps. First, confidence intervals are 
computed simultaneously for a finite set of points. Then, using the fact that 
splines are piecewise polynomials, a global band is found. We also suggest 
a way to compute confidence bands using cross-validation. While no theo- 
retical justification proof is given for the cross validation confidence bands, 
they are much faster to compute, and the numerical examples in Section 5 
show that this approach works considerably well. 

We apply the forecasting procedure suggested here to call center data. 
We forecast the continuation of two processes: the arrival process and the 
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4 GOLDBERG ET AL. 

workload process (i.e., the amount of work in the system; see, for example, 
Aldor-Noiman, Feigin and Mandelbaum, 2009). In call centers, the forecast 
of the arrival process plays an important roll in determining staffing lev- 
els. Optimization of the latter is important since salaries account for about 
60-70% of the cost of running a call center (Cans, Koole and Mandelbaum, 
2003). Usually, call center managers utilize forecasts of the arrival process 
and knowledge of service times, along with some understanding of customer 
patience characteristics (Zeltyn, 2005), to estimate future workload and de- 
termine staffing level (Aldor-Noiman, Feigin and Mandelbaum, 2009). The 
disadvantage of this approach is that the forecast of the workload is not per- 
formed directly, and instead it is obtained using the forecast of the arrival 
process. Reich (2010) showed how the workload process can be estimated 
explicitly, thereby enabling direct forecast of the workload. In this work we 
forecast the continuation of both the arrival and workload processes, given 
past days' information and the information up to some time of the day. We 
compare between the results for the arrival process and the workload pro- 
cess. We also compare our results for the arrival process to those of other 
forecasting techniques, namely, to the techniques that were introduced by 
Weinberg, Brown and Stroud (2007) and Shen and Huang (2008). 

The paper is organized as follows. The functional model and notation are 
presented in Section 2. The main theoretical results are presented in Sec- 
tion 3, were we first show how to construct the BLUP for the continuation 
of a curve. Next, we show how the BLUP can be computed efficiently. Confi- 
dence bands are discussed in Section 4. In Section 5 we apply the estimator 
to real-world data, comparing direct and indirect workload forecasting, and 
comparing our results to other techniques. Concluding remarks appear in 
Section 6. Technical proofs are provided in the Appendix. 

2. The functional model. In this section we present the model and 
notation that will be used throughout this paper. Let X be a random func- 
tion defined on the segment S = [0, T], and let the random functions Xi and 
X2 be the restrictions of X to the segments 5*1 = [0, U] and S2 = [U, T], re- 
spectively, for some < U < T. Our goal is to estimate X2 given information 
regarding Xi . 

We assume that X is of the form 

x{t) = fi{t) + ci){tyh, 

where fi{t) is the mean function, h = (hi, . . . , hp) is a random vector with 
mean zero and covariance matrix L, L is diagonal with Ln > . . . > Lpp > 
0, and (j){t) = ((/)i(t), . . . , 0p(t))' is a vector of orthonormal functions. We 
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THE BLUP FOR CONTINUATION OF A FUNCTION 5 

assume that the functions n and (pj have a basis expansion with respect to 
some B-sphne basis b = {bi, ...,1^)' , defined on some fixed knot sequence 
r. We denote this B-sphne space by Sk^r where k denotes the the sphnes' 
order. Thus, we can write iJ,{t) = h{t)' ^i and <j){t) = A'b(t), for some p x 1 
vector fi and N x p loading matrix A. Thus, we have 

(1) x{t) = b{ty{tx + Ah) = b{tyx, 

where x = fx + Ah. We think of N, the ambient functional space dimension, 
as being much larger then p, the dimension of the subspace which spanned 
by the random function X. 

We assume that instead of seeing X, we actually observe some noisy 
version of X, namely 

Y{t) = X{t)+e{t), 

where e{t) = xl){ty e is some random function independent of X{t), e is a 
q X \ zero- mean random vector with diagonal covariance matrix S, and xj) 
is a vector of functions. Since X{t) is a (random) linear combination of 
0i(t), . . . , (j)p{t), we consider the noise as the part of the observed function 
Y{t) that cannot be explained using such linear combinations. Hence we 
assume that i/? is orthogonal to (j). However, note that this orthogonality is 
not necessarily preserved when tp and (j) are restricted to one of the segments 
5*1 or 5*2. We assume that tp also has an expansion with respect to the basis 
b and hence ip{t) = B'b{t) for some N x q loading matrix B. Using this 
notation we may write 

(2) Y{t) = b{ty{tx + Ah + Be). 

The covariance functions u{s, t) = Cov(X {s) , X (t)) and v{s, t) = Cov{Y{s),Y{t)) 
canhewrittenhy b{sy {ALA')b{t) = b{sygb{t) and b{sy {ALA' +BT.B')b{t) = 
b{syGb{t), respectively. We define the correspondence covariance operators 
from Sk^T to itself for functions / G S^^t as 



(7/)(t) = / u{s,t)f{s)ds = b{tygWf 
Js 

(r/)(t) = f vis,t)fis)ds = bityGWf 
Js 

where W = Jg fo(s)6(s)'ds, and / is the expansion of the function / in the 
B-spline basis. 

We now introduce the notation for Xi and X2 and their respective noisy 
versions Yi and Y2. Let ri and T2 be knot sequences that agree with r on 
the segments [0, U) and ([/, T], respectively, and have knot multiplicity of k 
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6 GOLDBERG ET AL. 

at U. Let Sk,Ti for z = 1, 2 be the /c-ordered spline space with knot sequence 
Tj and let bi{t) = (6ji(t), . . . , biiy-{t)) be its corresponding B-spline basis. We 
wish to represent Xj and Yi (i = 1,2) using the representations of X and Y. 
Note that when the functions fJ-{t), 4>j{t), ipjit), v{s,t) and u{s,t) are 
known on [0,T], they are also known on Si and 82- Thus, it is enough to 
represent these functions using the bases bj in order to obtain representations 
for Xi and Yi. Recall that ^(i) = h[t)' ^, for some vector of coefficients fi. 
Using the knot-insertion algorithm (see de Boor, 2001, Chapter 11) we obtain 
new vectors /Xj such that (a) ^{t) = bj(t)'/Xj for all t on which 6j is defined 
and (b) jJLi is obtained from /x by truncation and a change of at most k 
coefficients. Similarly, using the knot-insertion algorithm, we can obtain the 
loading matrices Ai and Bi such that cj){t) = Aibi{t) and ip{t) = Bihi{t) for 
all t on which bi is defined. Summarizing, we have 

(3) Xi{s) = bi{s)'{^li + A^h)= bi{s)'xi 

Yi{s) = bi{s)'{iii + Aih + Bie)=bi{s)'yi 
v{s,t) = bi{s){AiLA'^ + BiT.B'^)bj{t) = bi{s)'G,jbj{t) 
u{s,t) = bi{s){AiLA'j)bj{t) = bi{sygijbj{t) 

for i,j = 1, 2 and for each s G Si and t £ Sj. 

We define the operators 7jj and Fjj from S^^t to 5'^.^-. for i,j = 1, 2 by 

(4) h^Jfm = f u{s,t)f{s)ds = bi{t)'g,,W,f 

JSj 

(r,,/)(i) = / v{s,t)f{s)ds = bi{tyGijWjf, 

JSj 

where Wj = Jg bj{s)bj{syds, and / is the expansion of the function / in 
bj. 

The model discussed above will be used for the estimation of X2 given 
Yi. Note that the distributions of X and Y are generally not known. In a 
realistic situation one needs to estimate the model components. Recall that 
Y{t) = b{ty[fi + Ah + Be), where h and e are random vectors with zero 
mean and covariance matrices L and S, respectively. Before discussing the 
forecasting procedure, we briefiy discuss how estimation of /x, L, S and the 
loading matrices A and B can be performed. 

Assume that the functions Y^^' , . . . ,Y^"^' were drawn according to the 
distribution law of Y. We distinguish between two scenarios. In the first 
scenario we assume that the functions Y^^' , . . . , y '"■) were observed. In this 
case one can estimate the various components of Y using functional principal 
component analysis (functional PCA). This can be done either by using 
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THE BLUP FOR CONTINUATION OF A FUNCTION 7 

PC A on the coefficients of the functions or by introducing some smoothness 
using regularized functional PCA (see, for example, Ramsay and Silverman, 
2005, Chapters 8 and 9). The matrices L and S are then determined by 
the eigenvalues of the PCA decomposition while the loading matrices A and 
B consist of the coefficients of the principal components with respect to 
the basis h. The size of L and S can be estimated using the gaps in the 
eigenvalues of the PCA decomposition. 

In the second scenario, we assume that some noisy discrete observations 
are given; for example in the following form 

for i = 1, . . . ,m, j = 1, . . . ,nj, and < tn < . . . < tin ■ < T , and where 
Cij ~ -/V(0, 0"^) are independent. In this case, one can first estimate the func- 
tions and then use functional PCA as described above. The simplest way 
to estimate the functions is to estimate each function separately, using, for 
example, regression splines (de Boor, 2001, Chapter 14). This method is 
used in the numerical examples in Section 5. Others, such as Kneip (1994) 
and Besse, Cardot and Ferraty (1997), suggest to estimate all the functions 
simultaneously. Both methods use some sort of functional PCA. These meth- 
ods suggest ways to estimate the length of h. The method by Besse, Cardot 
and Ferraty (1997) also assumes a splines environment, as in our case. 

3. The construction of the BLUP. Given Yi, the noisy version of 
the beginning part of the random function X , our goal is to find a good 
estimator for X2 , the continuation of Xi . 

Following Robinson (1991), we say that X2 is a good estimator of X2 given 
Yi if the following criteria hold: 

(CI) X2 is a linear function of Yi. 
(C2) X2 is unbiased, i.e., E[X2it)] = /i(t). 

(C3) X2 has minimum mean square error among the class of linear unbiased 
estimators. 

Two more demands regarding the estimator that seems desirable in our 
context are 

(C4) The random function X2 lies in the space Sk^j-2- 

(C5) When no noise is introduced, i.e., when Yi = Xi, the concatenation of 
X2 to Xi lies in S^^r'^ in other words, the combined function 

^^ r Xi(t) o<t<u 

\ X2{t) U <t<T 
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is smooth enough. 



An estimator that fulfills (C1)-(C5) will be referred to as a best linear unbi- 
ased predictor (BLUP). In this section we will show how to construct such 
a BLUP and prove that is is defined uniquely. 

Remark 3.1. Note that the definition of unbiased estimator in (C2) is 
not the usual definition. A more restrictive criterion is 

(C2*) X2 is unbiased in the the following sense E[X2it)\Yi] = E[X2{t)\Yi]. 

We will show that when Y is a Gaussian process, this criterion is fulfilled 
by the proposed BLUP as well. 

Remark 3.2. The analogous results in the multivariate case are well 
known. Here best estimator means estimator that meets criteria (C1)-(C3). 
Let Z = (Zi, Z2)' be a random vector such that 



E 



Zi 
Z2 



Var 



Zx 
Z2 



R 



Rii R12 
R21 R22 



\ rn2 I 

Then the BL UP of Z2 \ Z\ is given by 

(5) Z2 = 'm2 ^ R2\Rt\{Z\ - mi) 

where Rf^ is the Moore-Penrose pseudoinverse of Ru (see, for example, 
Marsaglia, 1964)- 

In the following, we define the linear operators that are the analogs of the 
matrices Rf^ and R21 from the multivariate case. This enables the construc- 
tion of a uniquely-defined BLUP for X2. 

We begin with defining the operator Tf^ : Sk^n — ^ Sk^n, which is the 
functional equivalent of Rf^ . Define the function 

v+^{s,t) = &i(s)'VFf iG+ VFf ibi(t) , 

for every s,t £ Si. Note that Wi is invertible since it is a Gram matrix of 
basis functions (see Sansone, 1991, Theorem 1.5). Define 

(r+/)(t)= / vti{s,t)f{s)ds = b^{tyw^'Gtif, 

JSi 

where / is the expansion of the function / in the B-spline basis &i. The 
following lemma justifies the notation of Vf^ as a pseudoinverse operator. 
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THE BLUP FOR CONTINUATION OF A FUNCTION 9 

Lemma 3.3. With probability one, 

riir+ (Fi - /x) = r+ rii(yi -i^) = Yi-fi. 

See proof in the Appendix. 

We are now ready to define tlie estimator for X2 given Yi, similarly to 
estimator (5) in the multivariate case, by 

(6) Mt) = Kt) + l2irti{Yi - ^l)it) = 62(t)'(/x2 + 521G+ (yi - Ml)) , 
for every t £ 82- Then we have 

Theorem 3.4. The estimator X2 meets criteria (C1)-(C5) and is unique 
up to equivalence. Moreover, if Y is a Gaussian process, then X2 meets 
criterion (C2*) as well. 

Proof. We show that (C1)-(C5) hold, one by one. 

(CI) holds because X2 is indeed a linear transformation of Yi as can be 
seen from (6). 
(C2) holds since 

E[X2{t)] = &2(i)'(M2 +521^+ (i?[yi - Ml])) = &2(t)V2 = Kt) ■ 

(C3) states that X2 should minimize the mean square error among all the 
unbiased linear estimators . Let X2 be another linear unbiased estimator. 
Then we can write X2 = (X2— X2)+X2. Since both X2 and X2 are unbiased, 
X2 — X2 is an unbiased linear estimator of zero, hence it is of the form 
b2{t)'M{yx — fii) for some N2 x Ni matrix M. Moreover, it can be shown 
that Cov(X2 - X2, X2 - X2) = 0. Indeed, 

C0Y{{X2-X2){s),iX2-X2){t)) = E[{X2 - X2){X2 - X2m] 

= b2{syE[{x2 - M2)(2/l - /Xl)']M'b2(t) 

-b2{syE[fi2 + 92iGti{yi - t^i){yi - Mi)']M'62W 
= b2(s)'(52iM' + <72iG+ GiiM'))62(i) = . 

where the last equality follows from Lemma 3.3. 

To see that X2 minimizes the mean square error, note that 

E[iX2 - X2f{t)] = E[{X2 - X2f{t)] + E[{X2 - X2f{t)] + 2E[{X2 - X2)(X2 - X2){t)] 
(7) = E[{X2 - X2f{t)\ + E[{X2 - X2f{t)\ > E[{X2 - X2f{t)] , 

which proves that X2 minimizes the mean square error and is unique up to 
equivalence. 
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(C4) holds by construction. 

(C5) states that when no noise is introduced, X2 a smooth continuation 
of Xi. First, note that by Lemma 3.3 

Xi(t) = &i(t)'(/xi + GiiG+ (:ei-Mi)) = 6i(t)'(/xi+yli(L^;G+ )(a;i-/2i)) . 

By definition we also have 

X2(t) = &2(t)'(/X2 + 52iG+ (X1-/21)) = b2(t)'(/^2 + A2(LA;G+)(a;i-/2i)) . 

Define X{t) = b{ty{fi{t) + A{LA[Gi^){xi - /xi)). It follows from the defi- 
nitions of fXi, Ai and bi that X{t) agrees with Xi on Si and with X2 on 5*2. 
Since X G Sk,T, the result follows. 

Finally, if y is a Gaussian process, then yi and X2 are normally dis- 
tributed such that Var(yi) = Gu and Cov(a;2, yi) = 521- Following Marsaglia 
(1964) we obtain 

(8) E[X2{t)\Yi] = b{tyE[x2\yi] = b{ty{ti2 + g2iGti{yi-tJ-i)) 

= X2{t) = E[X2it)\Yi] 
and criterion (C2*) is met. D 

It should be noted that when the parameters of the model are estimated 
(see end of Section 2) and a Gaussian model is assumed, the estimator X2 
can be considered as an empirical Bayes estimator. Indeed, the estimation 
of the distribution of h and e can be considered as estimating the prior 
distribution, while the the computation of X2 as in (8) is in fact finding the 
posterior mean given the data Yi. 

From a computational point of view, the computation of X2 may seem 
heavy. Indeed by (6) it involves finding the pseudoinverse of Gf^ which is an 
A'^i X A^i matrix. However, a simpler expression can be found. Recall that 



Gu = [Ai,Bi] 



' L ' 


' A[' 


S 


_B[ 


" L ' 

s 


. Usini 



CSG' . 



Using Lemma A. 1.3 with T 



where G = [^i,-Bi] and 5 
5^2^' we have 

G+ = GS^'^Us^'^G'GS^'^y^^ S^'^G' 

= GS^'^ (s-^/^{G'G)+S-\G'G)+S-^'^) S^'^G' 
= G{G'G)+S-^{G'G)+G' , 
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THE BLUP FOR CONTINUATION OF A FUNCTION 11 

which involves the pseudoinverse computation of a {p + q) x (p + q) matrix. 
Finally, instead of assuming that Yi{t) = &i(t)'(//i + Aih + i?ie), one 
may assume that 

yi(t) = bi(t)'(/2i + Ai/i + ei) 

where ei is a A'^i x 1 mean zero random vector with a^I covariance matrix 
and I is the identity matrix. In this case, 

(9) X2{t) = b2{ty{fi2+g2i{AiLA[ + a^iyHxi - /n)) 

which is the ridge regression estimator (Hoerl and Kennard, 1970). Once 
again, a simpler expression can be obtained using some matrix algebra (see 
Robinson, 1991, Eq. 5.2). We have 



521 



{AiLA[+a^I)-^ = A2LA[{AiLA[+a^I)-^ = A2 (^A[Ai + ct^L"^) ^ A[ 



and hence X2{t) = 62(i)'(M2 + ^2 (^1^1 + a'^L'^) ^ A[{xi - /ii)), which 
involves only the inverse of a p x p matrix. 

4. Confidence Bands. In Section 3 we suggested the estimator X2 
for the continuation of the function Xi. In this section we would like to 
construct confidence bands for this estimator. We consider two kinds of 
confidence bands. The first is a global confidence band. A global confidence 
band with confidence level (1 — 5)100% is defined as a pair of functions, 
the upper band fu and the lower band /l, such that P^fi^t) < X2{t) < 
fu{t) for all t G S2) >! — 5. We also consider local confidence bands. Local 
confidence bands do not require that the last condition holds simultaneously 
for all t; rather we are looking for a pair of functions gu and gi such that 
for aU t G ^2, P{gL{t) < X2{t) < gu{t)) >l-5. 

Our construction of both global and local confidence bands is based on 
the technique introduced by Knafi, Sacks and Ylvisaker (1985). The idea 
is the following. We first create simultaneous confidence intervals for some 
finite set of point. Then, using the attributes of spline functions, we com- 
plete this band for all points of 5*2. The computation of these bands can be 
computationally demanding. Hence, we suggest also confidence bands that 
are based on cross-validation. While these confidence bands do not have the 
theoretical guarantee of the former, they are simple to compute and seem 
to work reasonably well (see Section 5, Table 4). 

In the following, we assume that X and Y are Gaussian processes. There- 
fore X2 is also a Gaussian process and, by (8), £'[X2|yi] = X2- Similarly, we 
have 

Cow{X2{s),X2{t)\Yi) = 62(5)' [922 - GaiG+i G12) b^it) . 
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Define 

^ ' Var(X2(t)|yi)V2 ' 

then Z{t) is a zero-mean Gaussian process with variance 1 for each t. 

Let ti,...,tm be the breaks in r2, i.e., the knots of r2, ignoring knot 
multiplicity. Let tjj = tj + |^(ti+i — ij), j = 1,...,A; — 1. Define the 
following grid 

G = {^1,17 ^1,2, • • • , *m-l,fc-l) *m} , 

i.e., ^ is a grid that includes all the breaks in T2 and there are k — 2 equally 
spaced grid points between each two successive breaks of T2. We are inter- 
ested in computing simultaneous confidence intervals for the points in ^. In 
other words, for a given 5, we would like to find zs such that 
(10) 
P(max|X2(t) -X2(t)| > Z5Var(X2(t)|yi)^/2) = P(max|Z(i)| > zs) < 5 . 

zs can be found using simulations or by utilizing the inequality (Knafl, Sacks 
and Ylvisaker, 1985, Eq. (1.8) ) 

(11) 

m— 1 fc— 1 

P(max |Z(t)| > a) < P{\Z{ti,i)\ > a)+ ^ ^ P{\Z{k,j)\ < a , |Z(i,j+i)| > a) 
^"^^ i=i j=i 

Recall that the trajectories of (X2{t) — X2)\Yi are in Sk.T2- Hence for each 
segment between two successive breaks of T2, say [ij,tj+i], the trajectories 
are A:-ordered polynomials. Let p{t) be a restriction of such a trajectory to 
[tj,ii+i]. p{t) can be written, using Lagrange polynomials, as 

k k 

p{t) = Y.mp{ti,j) ; ij{t)= n 7^^^- 

Note that for ah t £ [tj,ti+i], \p{t)\ < Y.r=i I^(OIp(*m)- Hence, if 

(12) \p{Uj)\ < zsYar{X2{ti,j)\Yiy/^) forj = 1, . . . , fc , 

then for all t £ [ti, ti+i] 

k 

(13) \pit)\ <zsJ2 \iJit)\Y8.riX2iU,,)\Y^)'/') = zsDtM ■ 

i=i 

By (10) we have that with probability greater than or equal to 1 — S, the in- 
equality in (12) holds simultaneously for all i. Thus, with probability greater 
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than or equal to 1 — 5, the inequahty in (13) also holds. Define the pair of 
functions (/{/,/l) on 52 such that for all t G [tj,tj_|_i] 

(14) fu{t) = X2{t) + zsDtM ; fL{t)=X2{t)-zsDtM- 

Then (/f/j/i) are global confidence band for X2|yi with a confidence level 
greater than or equal to 100(1 — 5)%. Note that fu and fi are continuous. 
For local confidence bands, we can define the pair of functions {gu^Qh) 
on S2 such that for all t G [ij,tj-(_i] 

(15) gu{t) = X2{t) + ZiDt^it) ; gdt) = X2{t) - zsDtAt) , 

where 

zs = maxmin <.z^:P[ max |Z(tjj)| > 25 I < <5 > • 

Using zs ensures that gu and gL are continuous. The estimation of zs can be 
done using the relation in (11). We note that in the computation of zs we 
demanded that between each two successive breaks in T2, with probability 
greater than 1 — 5 the trajectories of X2 will stay within the band. While 
this can be restrictive if the distance between successive points in T2 is large, 
a simple solution is to take the set Q to be more dense. 

We remark here on some issues related to the confidence bands defined 
in (14-15). First, note that the bands are conservative, meaning that the con- 
fidence level is greater than 100(1— (5)%. Second, we have assumed that X2I Yi 
is a Gaussian process with known distribution. Third, the computation of zs 
(or Zs) can be demanding. Hence, we suggest to estimate confidence bands 
from the data using some sort of cross-validation. Compute Var(X2(t)|Yi)^'^ 
for all t € Q, and let D{t) be the /c-ordered regression spline function with 
knot sequence T2 of the points {(t, Var(X2(t)|yi)^'^) : t G Q}. We suggest 
the following confidence bands 

(16) fuit) = X2it) + CGioh.iDit) ; guit) = X2it) + CLoc.\Dit) , 

and similarly for f^ and gi where Cciobai and CLocai are computed using 
cross-validation as described below. Assume that the functions Y^^' , . . . , y'™) 
were observed. Partition the functions to K folds Fj : j = 1, . . . ,K. Compute 
X2{t) and D{t) for each subset of K — 1 folds. Define 

Cciobalj = min i c> : ^ 5] I{\Yi{t) - M^)] < cD{t) foralU e G} > I - 6 
ClocIJ = min <j : min I ^ ^ I{\Y,{t) - Mt)] < cD{t)} ) > 1 " 4 



*^^ W\y..f, 
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14 GOLDBERG ET AL. 

where I{B} is the indicator function of the set B. Then we suggest to choose 
Cciobai and CLocai to be the median of Cciobaij' and CLocai ,j respectively. We 
note that the suggestion to extend the confidence bands from points in the 
grid to the whole segment using regression splines seems reasonable when 
the grid is fine enough. In the numerical examples of Section 5 we compute 
the confidence bands using the cross-validation technique. 

5. Numerical Examples. In this section we apply the estimator X2 
to call center data. We are interested in forecasting the continuation of two 
processes: the arrival process and the workload process. The estimators of 
these two processes play an important roll in determining staffing level at 
call centers (see, for example, Aldor-Noiman, Feigin and Mandelbaum, 2009; 
Reich, 2010; Shen and Huang, 2008). Usually, staffing levels are determined 
in advance, at least one day ahead. Here we propose a method for updating 
the staffing level, given information obtained throughout the beginning of 
the day. As noted by Gans, Koole and Mandelbaum (2003) and by Shen 
and Huang (2008), such updating is operationally beneficial and feasible. 
If performed appropriately, it could result in higher efficiency and service 
quality: based on the revised forecasts, a manager can adjust staffing levels 
correspondingly, by offering overtime to agents on duty or dismissing agents 
early, calling in additional agents if needed, increasing or reducing cross- 
selling, and transferring agents to other activities such as email inquiries 
and faxes. 

This section is organized as follows. We first describe the arrival and work- 
load processes (Section 5.1). We then describe the data (Section 5.2) and the 
forecast implementation (Section 5.3). The analysis appears in Sections 5.4- 
5.6. Finally, confidence bands are discussed in Section 5.7. 

5.1. The arrival and workload processes. We define the arrival process 
of day j, aj{t), as the number of calls that arrive on day j during the time 
interval [t — c,t], where t varies continuously over time and c is some fixed 
constant. Note that aj{t) itself is not a continuous function, but when the 
call volume is large and this function does not change drastically over short 
time intervals, it can be assumed that the function aj{t), for each day j, 
arises from some underlying deterministic smooth arrival rate function X{t) 
plus some noise (Weinberg, Brown and Stroud, 2007). In this case aj(t)/c 
can be considered as an approximation of the smooth function A(t). We 
now describe the workload process u)j{t) for each day j. The function Wj{t) 
counts the number of calls that would have been handled by the call center 
on day j at time t, assuming an unlimited number of agents and hence no 
abandonments. From a management point of view, the advantage of looking 
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at 'Wj{t) over looking at aj{t) is that Wj{t) reflects the number of agents 
actually needed at each point in time. However, as opposed to the process 
aj{t), which is observable in real time, the computation oiwj{t), for a specific 
time t, involves estimation of call durations for abandoned calls and can be 
performed only after all calls entered up to time t are actuality served (see the 
discussion at Aldor-Noiman, Feigin and Mandelbaum, 2009; Reich, 2010). 

5.2. The data. The data used for the forecasting examples were gathered 
at a call center of a large U.S. commercial bank. The bank has various types 
of operations such as retail banking, consumer lending, and private banking. 
Since the call arrival pattern varies over different types of services, we restrict 
attention to retail services, which account for approximately 70% of the calls 
(see Weinberg, Brown and Stroud, 2007). The first two examples are of the 
arrival process and the workload process, for weekdays between March and 
October 2003. The data for the first example consists of the arrival counts 
at five-minutes resolution between 7:00 AM and 9:05 PM (i.e., c = 5 in 
the definition of aj{t)). The data for the second example consists of average 
workload, also in five-minutes resolution, between 7:00 AM and 9:05 PM. 
There are 164 days in the data set after excluding some abnormal days such 
as holidays. Figure 1 shows arrival count profiles for different days of the 
week. 

The third example explores the arrival process during weekends between 
March and October 2003. There are 67 days in the data set (excluding one 
day with incomplete data). As can be seen from Figure 1, the weekend 
behavior is different from that of the working days, and there is a Saturday 
pattern and a Sunday pattern. The data for this example consists of the 
arrival counts at fifteen-minutes resolution between 8 AM and 5 PM. The 
change in interval length from the previous two examples is due to the 
decreased call-counts. The change in day length is due to the low activity in 
early morning and late afternoon hours on weekends (see Figure 1). 

In the first and second examples, we used the first 100 weekdays as the 
training set and the last 64 weekdays as the test set. For each day from 
day 101 to day 164, we extracted the same-weekday information from the 
preceding 100 days. Thus, for each day of the week we have about 20 training 
days. For the third example, the test set consists of weekend days 41 to 67 
while the training set for each day consist of its previous 40 weekend days. 
Thus, similarly, for each day we have about 20 training days. Additionally, 
we used the data from day start, up to 10 AM and up to 12 PM. All forecasts 
were evaluated using the data after 12 PM, which enabled fair comparison 
between the results of the different cut points (10 AM and 12 PM). We also 
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Fig 1. Arrival count in five-minutes resolution for six successive weeks, grouped according 
to weekday (Friday was omitted due to space constraints). There is a clear difference 
between workdays, Saturdays, and Sundays. For the working days, it seems that there is 
some common pattern. Between 7 AM and 10 AM the call count rises sharply to its peak. 
Then it decreases gradually until 4 PM. From 4 PM to 5 PM there is a rapid decrease 
followed by a more gradual decrease from 5 PM until 12 AM. The call counts are smaller 
for Saturday and much smaller for Sunday. Note also that the mam activity hours for 
weekends are 8 AM to 5 PM, as expected. 



compare our results to the mean of the preceding days, from 12 PM on. 

For a detailed description of the first example's data, the reader is referred 
to Weinberg, Brown and Stroud (2007), Section 2. For an explanation of 
how the second example's workload process was computed, the reader is 
referred to Reich (2010). The data for the third example was extracted 
using SEEStat, which is a software written at the Technion SEELab^. We 
refer the reader to Donin et al. (2006) for a detailed description of the U.S. 
commercial bank call-center data from which the data for all three examples 
was extracted. The U.S. bank call-center data is publicly downloaded from 
SEESLab server^. 

5.3. Forecast implementation. The forecast was performed by Matlab 
implementation of the BLUP algorithm from Section 3, where we enable 
regularization as in (9). For the implementation we used the functional data 



^SEELab: The Technion Laboratory for Service Enterprise Engineering. Webpage: 
http : //ie . technion . ac . il/Labs/Serveng 
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analysis Matlab library, written by Ramsay and Silverman^. The Matlab 
code, as well as the data sets, are downloadable (see Supplement A). The 
parameters for the forecast were chosen using 10-fold cross-validation (see 
end of Section 2). We computed local confidence bands with 95% confidence 
level using cross-validation, as described in (16). We quantified the results 
using both Root Mean Squared Error (RMSE) and Average Percent Error 
(APE), which are defined as follows. For each day j, let 

100^ \Njk-Njk\ 




RMSE,= [-J2{N,k-N,kf] ; APE, = —J2 



where Njk is the actual number of calls (mean workload) at the k-th time 
interval of day j in the arrival (workload) process application, Njk is the 
forecast of Nji^, and K is the number of intervals. 

5.4. First example: Arrival process for weekdays data. Forecasting the 
arrival process for the first example data was studied by both Weinberg, 
Brown and Stroud (2007) and Shen and Huang (2008). Weinberg, Brown and 
Stroud assumed that the day patterns behave according to an autoregressive 
model. The algorithm they suggest first gives a forecast for the current 
day based on previous days' data. The algorithm estimates the parameters 
in the autoregressive model using Bayesian techniques. An update for the 
continuation of the current day forecast is obtained by conditioning on the 
data of the current day up to the cut point. We refer to this algorithm as 
Bayesian update (BU) for short. Similarly, the algorithm by Shen and Huang 
assumes an autoregressive model and gives a forecast for the current day. 
They then update this forecast using least-square penalization, assuming an 
underlying discrete process. We will refer to this algorithm as penalized least 
square (PLS). 

Comparison between the results of all three algorithms for the first data 
set appears in Table 1. Note that for all of the algorithms and all of the cate- 
gories there is improvement in the 10 AM and 12 PM forecasts over the fore- 
cast based solely on past days. The RMSE mean decreases by about 5-13% 
for the 10 AM forecast, and by 12-15% for the 12 PM forecast, depending on 
the algorithm. It should be noted that the algorithms by Weinberg, Brown 
and Stroud and by Shen and Huang use information from all 100 previous 
days and the knowledge of the previous day call counts. In comparison, the 
BLUP algorithm uses only the same weekday information (~20 days) and 



^The functional data analysis Matlab library can be download form ftp: //ego. psych, 
mcgill . ca/pub/ramsay/FDAf uns/Matlab/ 
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the previous day information is not part of its training set. Nevertheless, the 
results are similar. 

The forecasting results for the week that follows Labor Day appear in Fig- 
ure 2. It can be seen that for the Tuesday that follows Labor Day (Monday) 
the call counts are much higher than usual. This is captured, to some degree, 
by the 10 AM forecast and much better by the 12 PM forecast. The same 
phenomenon occurs, with less strength, during the Wednesday and Thurs- 
day following Labor Day, until on Friday all the forecasts become roughly 
the same. It seems that the power of the continuation-of-curve forecasting 
is exactly in such situations, in which the call counts are substantially dif- 
ferent than usual throughout the day, due to either predictable events, such 
as holidays, or unpredictable events. 

5.5. Second example: Workload process for weekdays data. The second 
example consists of the workload process for weekdays data for the same 
period as the first example. We forecast the workload process based on 
these sets of data: previous days' data, up to 10 AM data, and up to 12 
PM data. We refer to this forecast as direct workload forecast since we use 
past workload estimation as the basis for the forecast. An alternative (and 
simpler) workload forecasting method was proposed by Aldor-Noiman, Feigin 
and Mandelbaum (2009). Aldor-Noiman, Feigin and Mandelbaum suggest 
to forecast the workload by multiplying the forecasted arrival rate by the 
estimated average service time (see Aldor-Noiman, Feigin and Mandelbaum, 
2009, Eq. 21). We refer to this method as indirect workload forecasting. 

Comparison between the two methods appears in Table 2. Following 
Aldor-Noiman, Feigin and Mandelbaum (2009), we estimated the average 
service time over a 30 minute period for indirect workload computations. 



Example 1 


Previous day 
mean 




10 AM 






12 PM 




RMSE 


BU 


PLS 


BLUP 


BU 


PLS 


BLUP 


Minimum 


12.46 


11.08 




11.51 


11.07 




11.51 


Ql 


14.11 


14.00 


13.31 


13.51 


13.56 


13.33 


13.27 


Median 


16.40 


15.50 


14.87 


14.69 


14.80 


14.60 


14.17 


Mean 


19.11 


17.86 


16.48 


16.83 


16.59 


16.13 


16.15 


Q3 


21.27 


19.87 


17.26 


17.04 


16.58 


16.39 


15.92 


Maximum 


68.93 


57.72 




52.09 


53.66 




51.03 



Table 1. Summary of statistics (m,inim,um,, lower quartile (Ql), median, mean, upper 
quartile (Q3), maximum) of RMSE for the forecast based on the mean of the previous days, 
and BU, PLS, and BLUP using data up to 10 AM and up to 12 PM for the call arrival 
data set. The results for BU and PLS were taken from the original papers. No maximum 
and minimum results were given for PLS. 
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Note that the direct workload forecast results are slightly better than the 
indirect workload forecast in most of the categories. Also note that in almost 
all categories, there is an improvement in the 10 AM and 12 PM forecasts 
over the forecast based solely on past days. The RMSE mean decreases by 
about 11% (9%) for the 10 AM forecast, and by 15% (12%) for the 12 PM 
forecast for the direct (indirect) forecast. Figure 3 presents a visual com- 
parison between the direct and the indirect forecast methods on a specific 
day. The two forecasts look roughly the same, which is also true for all other 
days in this data set. 

While in this example there is no significant difference between the direct 
and indirect workload forecasts, we expect these methods to obtain different 
forecasts when the arrival rate changes during an average service time. This 



400 



Tuesday 



Wednesday 




12 16 20 

Thursday 



400 



B 12 16 20 

Friday 




Fig 2. Forecasting results for the week following Labor Day (Sept. 2-5, 2003) for the 
call arrival process of the first example. Labor Day itself (Monday) does not appear since 
holiday data is not included m the data set. The black dots represent the true call counts 
in five-minutes resolution. The forecasts based on previous days, 10 AM data, and 12 PM 
data are represented by the blue, red, and green lines, respectively. 
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is true, for example, for arrival and service of patients in emergency rooms. 
The arrival rates of patients to emergency rooms can change within an hour 
while the time that a patient spends in emergency room (the "service time" ) 
is typically on the order of hours. As pointed out by Rozenshmidt (2008, 
Section 6), in such cases, forecasting the workload by the arrival count mul- 
tiplied by the average service time may not be accurate. This is because the 
number of customers in the system is cumulative, while the arrival count 
counts only those who arrive in the current time interval. Thus, if the ar- 
rival count is lower than it was in the previous time interval and the average 
service time is long, the workload is underestimated. Similarly, if the arrival 
count is larger than previously, the workload is overestimated. 

5.6. Third example: Arrival process for weekends data. The third exam- 
ple it that of the weekends' arrivals. The main difference between the first 
two examples and this one is that the data in this example cannot be con- 
sidered as data from successive days, due the six day difference between 
any Sunday and it successive Saturday. Recall that the models considered 
by Weinberg, Brown and Stroud (2007) and Shen and Huang (2008) have 
an autoregressive structure. Since this autoregressive structure seems not to 
hold for this data, the techniques by Weinberg, Brown and Stroud and Shen 
and Huang are not directly applicable. But even when the autoregressive 
structure does not hold, the results appearing in Table 3 reveal that fore- 
casting for this data set is still beneficial. Indeed, the RMSE (APE) mean 
decreases by about 5% (2%) for the 10 AM forecast, and by 10% (4%) for 
the 12 PM forecast. While these results are not as good as the results in 
the previous examples, note that the curves in this example begin an hour 
later and while the call counts are lower during weekends, the arrival rate 



Example 2 


Day ahead 


10 AM 


12 PM 


RMSE 


Workload 


Workload 


Workload 


Workload 


Workload 


Workload 




(indirect) 


(direct) 


(indirect) 


(direct) 


(indirect) 


(direct) 


Minimum 


8.72 


8.41 


7.98 


T.71 


7.96 


8.50 


Ql 


10.76 


10.58 


10.21 


10.27 


10.21 


10.11 


Median 


12.10 


12.26 


11.63 


11.21 


11.66 


11.05 


Mean 


15.97 


15.95 


14.59 


14.26 


14.13 


13.48 


Q3 


15.08 


15.21 


14.53 


14.20 


13.89 


12.76 


Maximum 


96.09 


94.79 


95.74 


85.11 


93.39 


71.20 



Table 2. Summary of statistics (m,inim,um, lower quartile (Ql), median, mean, upper 
quartile (Q3), maximum,) of RMSE for the forecast based on the mean of the previous 
days' data, up to 10 AM data and up to 12 PM data, for the workload data set, for both 
the indirect and the direct forecast methods using the BL UP. 
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Fig 3. Workload forecasting for Friday, September 5, 2003, using both the direct and the 
indirect methods. The black curve represents the workload process estimated after observing 
the data gathered throughout the day. The blue and red curves represents the workload 
forecasts for the indirect and direct forecasts, respectively, given data up to 12 PM. 

variance does not change drastically (see Figure 1). 



5.7. Confidence hands. Following Weinberg, Brown and Stroud (2007), 
we introduce the 95% confidence band coverage (COVER) and the average 
95% confidence band width (WIDTH). Specifically, for each day j, let 

. K . K 

COVER, = ^ E ^ (FL,jk < Njk < Fu,jk) ; WIDTH, = ^ E (^t/jfc - i^Ljfc) , 



K 



k=l 



K 



fc=i 



Example 3 




RMSE 






APE 






Day ahead 


10 AM 


12 PM 


Day ahead 


10 AM 


12 PM 


Minimum 


3.66 


3.62 


3.92 


4.47 


4.33 


4.60 


Ql 


5.37 


5.63 


5.05 


5.57 


5.41 


5.64 


Median 


6.80 


7.01 


6.87 


6.71 


6.84 


6.31 


Mean 


7.64 


7.19 


6.97 


7.23 


7.10 


6.97 


Q3 


9.01 


8.97 


8.59 


8.83 


8.16 


7.44 


Maximum 


16.12 


11.84 


11.13 


12.17 


11.80 


12.46 



Table 3. Summary of statistics (minimum, lower quartile (Ql), median, mean, upper 
quartile (Q3), maximum) of RMSE and APE for the forecast based on the mean of the 
previous days and the BLUP, using 10 AM and 12 PM cuts for the weekends data set. 
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where (F^jk, Fu,jk) is the confidence band of day j, evaluated at the begin- 
ning of the /c-th interval (see (16)). The mean coverage and mean width, for 
all three examples, are presented in Table 4. First, note that for all three 
examples, the width of the confidence band narrows down as more informa- 
tion is revealed. In other words, the width of the confidence band for the 
12 PM forecast is narrower than the width for the 10 AM forecast which, 
in turn, is narrower than the width for the pervious days' mean. We also 
see that the mean coverage becomes more accurate as more information is 
revealed. Figure 4 depicts the confidence bands for the arrival process on 
a particular Sunday. Note that the confidence bands for the previous days' 
forecast and the 10 AM forecast almost coincide. However, at 12 PM, when 
enough information on this particular day becomes available, the confidence 
band narrows down and does capture the underlying behavior. 



Confidence bands 




9 10 11 12 13 14 



Fig 4. Confidence bands for Sunday, August 10, 2003. The black dots represent the true call 
counts in fifteen-mmutes resolution. The confidence bands based on previous days, 10 AM 
data, and 12 PM data are represented by the blue, red, and green lines, respectively. 
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Example 1 


Example 2 


Example 3 


Example 1 


Example 2 


Example 3 


Mean 
10 AM 
12 PM 


93.19% 
94.14% 
94.86% 


91.69% 
92.27% 
93.08% 


98.15% 
98.64% 
96.49% 


79.73 
74.99 
73.07 


62.80 
56.45 
55.95 


40.15 
39.53 
31.40 



Table 4. The mean confidence band coverage and the mean width for the forecasts based 
on the previous days' mean, the 10 AM cut and the 12 PM cut for the arrival process on 
the working days data set (Example 1), the workload process on the working days data set 
(Example 2) and the arrival process on the weekends data set (Example 3). 
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Summarizing, using call center data, we demonstrated that forecasting of 
curve continuation can be achieved successfully by the proposed BLUP. We 
also showed that confidence bands for such forecasts can be obtained using 
cross-validation. 

6. Concluding Remarks. We have constructed the best linear unbi- 
ased predictor (BLUP) for the continuation of a curve. We now add the 
following comments regarding the construction of the BLUP and its appli- 
cation to call center data. 

First, in our analysis we have used a spline model to describe the func- 
tions. This is not required for the construction of the BLUP, and the proof 
of Theorem 3.4 holds for any function space of finite dimension. However, 
as discussed previously, there are two main advantages of using spline rep- 
resentation. First, the computation of the covariance operators, for 81,82 
and 8 and between them, as well as the pseudo-inverse covariance operator 
r^^, are all computationally simple when working with splines. Second, the 
representation of the restriction of a function to a partial segment does not 
suffer from collinearity of the basis functions, which can be the case for a 
more general setting. Indeed, the number of basis elements can be reduced 
significantly in the spline function model, depending on the number of knots 
in the partial segment, while the number of basis elements could remain the 
same in a more general model. 

Second, we have assumed that the random function X lies within a func- 
tion space of (possibly large) finite-dimension. While this is a restrictive 
assumption, there are some difficulties with the BLUP definition (and com- 
putation) for a random function that lies in an infinite-dimensional space. 
The main difficulty is that inverting the covariance operator (as done in 
Lemma 3.3 for finite dimension) is problematic since the inverse of the co- 
variance operator need not be bounded and may not exists. However, we 
believe that characterization of the BLUP in the infinite-dimension case is 
possible under some conditions. Further research is required to address this 
question. 

Finally, in this work we forecasted the continuation of the workload pro- 
cess. As discussed in Feldman et al. (2008) and Reich (2010), the workload 
process is a more appropriate candidate than the arrival process, as a basis 
for determining staffing levels in call centers. This work, along with Aldor- 
Noiman, Feigin and Mandelbaum (2009) and Reich (2010), are the first steps 
in exploring direct forecasting of the workload process, but more remains to 
done (see, for example Whitt, 1999; Zeltyn et al, 2009). 
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APPENDIX A: PROOFS 
A.l. Lemma A.l. 

Lemma A.l. Let T be a n x p matrix of rank s and let L be a p x p 
positive definite diagonal matrix. Then the following assertions are true 

1 T'T(T'T)^T' = T' 

2. T'LT{T'LT)+T' = T' 

3. {T'T)+ = T' {{TT')+fT 

Proof. 1. If T'T is invertible then {T'T)+ = {T'Ty^ and the result 
follows. Otherwise, let UAV' be the svd (singular value decomposition, 
see Golub and Loan, 1983) of T where U and V are orthonormal 
columns matrices of size nx s and p x s respectively, and A is a s x s 
positive definite diagonal matrix. Then 

T'T{T'T)+T' = {VAU'){UAV'){{VAU'){UAV'))^VAU' 
= VA^V'{VA'^V')^VAU' . 

Since A is invertible and V has orthonormal columns (VA'^V')'^ = 
VA-^V. Hence 

T'T{T'TyT' = VA'^V'VA-'^V'VAU' = VAU' = T' . 

2. Denote W = L^/'^T, then T'LT{T'LT)+T' = W'W{W'W)+W'L-^/'^. 
Using 1., we obtain W'W{W'W)+W'L-'^/'^ = W'L~^^'^ = T . 

3. Since TT' is a positive semi-definite matrix, TT' has a spectral repre- 
sentation of the form TT' = Yfi=i Kviv'i where s < min{n,p}, Aj > 
and {vi} is an orthonormal set of vectors. Note that TT'vi = XiVi and 
hence T'TiT'vi) = XiT'vi. Moreover ||r'wj|| = v[TT'vi = v[[\iVi) = 
\i. Hence, we obtained that [T'vi/y/Xi] is the set of orthonormal 
eigenvectors of T'T with the respective non-zero eigenvalues {Aj}. 
Thus, 



Using the spectral representation we also have 



T' {{TT')+f T = T'(Y, Xr\v'i T . 



In order to show that (T'T)^ = T' {(TT')^) T we need to show the 
following (see Golub and Loan, 1983): 
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(a) (T'T) (r {{TT')+fT) [T'T) = (T'T) 



(b) (t' {{TT')+f t) {T'T){T'{{TT')+fT) = (T'{{TT')+fT 

(c) ({T'T) (r {{TT')+f t))' = {T'T){T' {{TT')+f T) 



(d) ((r'((rT')+)^T)(r'r)) = {r {{TT')+f t){t't) 

In order to see (a), note that 

{T'T) {t' {{TT')+f r) {T'T) = T'{TT') [ ^ Xfviv'A {TT')T 

= r (JZx^.v^ (JlK'v^^'^ (i:\v,v[^ T 

= T' {y^ Viv'AT = T'T. 
Similarly, for (b), we have 

{t' {{tt')+)^ tYt't) (t' {{TT')+)^T^ = 

= T'(±K\v'}j{TT')^(^±Xr\v'}jT 
= T' \y^ K^Viv'i J ( ^ XiViv'i J ( ^ Xi^Viv'i J T 
= T' (e \"'^*^A T = T' {{TT')+f T . 
For (c), 
T'T) {r {{TT')^f t)) = [t'{TT') {± \fv,v^ T^ = (t' {± Xr^v,v[^ T^ 

= T' Ij2 K^^'i'^'i] T = (T'T) (t' {{TT')+)'^t) . 

Finally, (d) is shown similarly to (c). 

D 
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A. 2. Proof of Lemma 3.3. 

Proof. By (2) we may write Yi(t) - fi{t) = bi{ty{Aih + Bie). Hence, 
(rnr+ {Aih + Bie)) (t) = 



h-i{t)'GiiWiW^^G'[^{Aih + Bie) 
bi{tyGuGtMih + Bie) 



bi(t)'[Ai,Si] 



L 

s 






[Ai,Bi 



L 

s 



B[ 



Aih 
Bie 



and the result follows from Lemma A.l. 

Substituting h = LA2 and e = in the last equation, we also obtain 



(17) 



G11G+ 512 = rnG+ (^iLA'2 + BiO) = 512 . 
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